<html>
<head>
<title>ELKI: Environment for DeveLoping KDD-Applications Supported by Index-Structures</title>
</head>
<body>
<p>ELKI: Environment for DeveLoping KDD-Applications Supported by Index-Structures.</p>

<p>ELKI is a generic framework for a broad range of KDD-applications
and their development.
For background, contact-information, and contributors see
<a href="https://elki-project.github.io" target="_parent">https://elki-project.github.io/</a>.</p>

<p>This is the documentation for version 0.7.5, published as:<br />
<i>
  Erich Schubert and Arthur Zimek:<br />
  ELKI: A large open-source library for data analysis<vr />
  ELKI Release 0.7.5 "Heidelberg"<br />
  CoRR arXiv 1902.03616
</i>
</p>

<h2>Getting started</h2>
<p>The <a href="https://elki-project.github.io/">ELKI website</a> contains additional documentation.
A <a href="tutorial.html">Tutorial</a> exported is included with this documentation and a good place to start.</p>

<h3>Invocation</h3>
<p>To use the KDD-Framework we recommend an executable .jar-file:
<a href="../elki.jar">elki.jar</a>. Since release 0.3 it will by default invoke a minimalistic GUI called MiniGUI when
you call <code>java -jar elki.jar</code>. For command line use (for example for batch processing and scripted operation),
you can get a description of usage by calling <code>java -jar elki.jar KDDCLIApplication -h</code>.</p>
<p>The MiniGUI can also serve as a utility for building command lines, as it will print the full command line to the log window.</p>

<p>For more information on using files and available formats
as data input see {@link de.lmu.ifi.dbs.elki.datasource.parser}. ELKI uses
a whitespace separated vector format by default, but there also is a parser for
ARFF files included that can read most ARFF files (mixing sparse and dense vectors is currently not allowed).</p>

<p>An extensive list of parameters can be browsed <a href="parameters-byclass.html">sorted by class</a>
or <a href="parameters-byopt.html">sorted by option ID</a>.</p>

<p>Some examples of completely parameterized calls for different algorithms are described
at <a href="examplecalls.html">example calls</a>.</p>

<p>A list of related publications, giving details on many implemented algorithms,
can be found in the <a href="references.html">class article references list</a>.</p>

<h2>Workflow - Where Do Which Objects Go?</h2>
<p>The database connection manages reading of input files or databases and provides a
{@link de.lmu.ifi.dbs.elki.database.Database Database}-Object - including index structures - as a virtual database to the KDDTask.
The KDDTask applies a specified algorithm on this database and collects the result from the algorithm.
Finally, KDDTask hands on the obtained result to a {@link de.lmu.ifi.dbs.elki.result.ResultHandler ResultHandler}.
The default-handler is  {@link de.lmu.ifi.dbs.elki.result.ResultWriter ResultWriter}, writing the result to STDOUT or,
if specified, into a file.</p>

<h3>Database and indexing layer</h3>
<p>The database and indexing layer is a key component of ELKI.
This is not just a storage for <code>double[]</code>, as with many other frameworks.
It can store various types of objects, and the integrated index structures provide access to fast
{@link de.lmu.ifi.dbs.elki.database.query.distance.DistanceQuery distance},
{@link de.lmu.ifi.dbs.elki.database.query.similarity.SimilarityQuery similarity},
{@link de.lmu.ifi.dbs.elki.database.query.knn.KNNQuery kNN},
{@link de.lmu.ifi.dbs.elki.database.query.rknn.RKNNQuery RkNN} and
{@link de.lmu.ifi.dbs.elki.database.query.range.RangeQuery range} query methods
for a variety of distance functions.</p>
<p>The standard flow for initializing a database is as depicted here:<br />
<img src="figures/database-initialization.png" width="600" height="341" alt="Database initialization" /><br />
The standard stream-based data sources such as
{@link de.lmu.ifi.dbs.elki.datasource.FileBasedDatabaseConnection FileBasedDatabaseConnection}
will open the stream, feed the contents through a
{@link de.lmu.ifi.dbs.elki.datasource.parser.Parser Parser} to obtain an initial
{@link de.lmu.ifi.dbs.elki.datasource.bundle.MultipleObjectsBundle MultipleObjectsBundle}. This is
a <em>temporary</em> container for the data, which can then be modified by arbitrary
{@link de.lmu.ifi.dbs.elki.datasource.filter.ObjectFilter ObjectFilter}s.<br />
In the end, the
{@link de.lmu.ifi.dbs.elki.datasource.bundle.MultipleObjectsBundle MultipleObjectsBundle}
is bulk-inserted into a {@link de.lmu.ifi.dbs.elki.database.Database Database}, which will then
invoke its {@link de.lmu.ifi.dbs.elki.index.IndexFactory IndexFactory}s to add
{@link de.lmu.ifi.dbs.elki.index.Index Index} instances to the appropriate relations.</p>
<p>When a request for a
{@link de.lmu.ifi.dbs.elki.database.query.distance.DistanceQuery distance},
{@link de.lmu.ifi.dbs.elki.database.query.similarity.SimilarityQuery similarity},
{@link de.lmu.ifi.dbs.elki.database.query.knn.KNNQuery kNN},
{@link de.lmu.ifi.dbs.elki.database.query.rknn.RKNNQuery RkNN} or
{@link de.lmu.ifi.dbs.elki.database.query.range.RangeQuery range} query is received by the database,
it queries all indexes if they have support for this query. If so, an optimized query is returned,
otherwise a linear scan query can be returned unless
{@link de.lmu.ifi.dbs.elki.database.query.DatabaseQuery#HINT_OPTIMIZED_ONLY DatabaseQuery.HINT_OPTIMIZED_ONLY}
was given.</p>
<p>For this optimization to work, you should be using the proper APIs of the
{@link de.lmu.ifi.dbs.elki.database.Database Database} interface or
{@link de.lmu.ifi.dbs.elki.database.QueryUtil QueryUtil} helper where possible, instead of
initializing low level classes such as an explicit linear scan query.</p>
<p>For efficiency, try to instantiate the query only once per algorithm run, and avoid running the
optimization step for every object.</p>

<h2>How to make use of this framework</h2>

<h3>Extension</h3>
To provide new applications one is simply to implement the specified interfaces.
There are interfaces for a broad range of targets of development. Compare the
<a href="overview-tree.html">tree of interfaces</a> to get an overview
concerning the provided interfaces.

<p>A good place to get started is to have a look at some of the existing algorithms,
and see how they are implemented.
For example the {@link de.lmu.ifi.dbs.elki.algorithm.DummyAlgorithm DummyAlgorithm}
while it does not produce any result, will teach you how to perform
k-nearest-neighbor queries properly. It does however have a hard dependency on the
Euclidean distance and the datatypes supported by it. In order to support arbitrary
distance functions, extend the class
{@link de.lmu.ifi.dbs.elki.algorithm.AbstractDistanceBasedAlgorithm AbstractDistanceBasedAlgorithm}
instead. This is another simple example, this time for obtaining a class parameter.</p>

<p>Visit the <a href="https://elki-project.github.io/">ELKI Wiki</a>, which has a
growing amount of documentation. You are also welcome to contribute, of
course!</p>

<h3>Parameterization API</h3>
<p>ELKI is designed for command-line, GUI and Java operation. For command-line and GUI, an extensive
help functionality is provided along with input assistance. Therefore, you should also support the
parameterizable API. The requirements are quite different from regular Java constructors, and cannot
be expressed in terms of a Java API.</p>
<p>For useful error reporting and input assistance in the GUI we <em>need</em> to have more extensive
typing than Java uses (for example we might need numerical constraints) and we also want to be able
to <em>report more than one error at a time</em>. In ELKI 0.4, much of the parameterization was
refactored to static helper classes usually found as a <tt>public static class Parameterizer</tt>
and subclasses of
{@link de.lmu.ifi.dbs.elki.utilities.optionhandling.AbstractParameterizer AbstractParameterizer}.</p>
<p><img src="figures/parameterization.png" width="600" height="213" alt="Parameterization" /></p>
<p>Keep the complexity of Parameterizer classes and constructors invoked by these classes low, since
these may be heavily used during the parameterization step. Postpone any extensive initialization
to the main algorithm invocation step!</p>

</body>
</html>
